Abstract

We examine the problem of covariance belief revision using a geometric approach. 
We exhibit an inner-product space where covariance matrices live naturally — a space 
of random real symmetric matrices. The inner-product on this space captures aspects 
of our beliefs about the relationship between covariance matrices of interest to us, pro- 
viding a structure rich enough for us to adjust beliefs about unknown matrices in the 
light of data such as sample covariance matrices, exploiting second-order exchangeabil- 
ity specifications. 

Keywords: BELIEF ADJUSTMENT; COVARIANCE ESTIMATION; EXCHANGEABIL- 
ITY; LINEAR BAYES; MATRIX INNER-PRODUCT; SUBJECTIVIST. 

1 Revising beliefs about covariance structures 

Quantifying relationships between variables is of fundamental importance in Bayesian anal- 
ysis. However, there are many difficulties associated even with learning about covariances. 
For example, it is often difficult to make prior covariance specifications, but it is usually 
even harder to make the statements about the uncertainty in these covariance statements 
which are required in order to learn about the covariance statements from data. Further, a 
covariance structure is more than just a collection of random quantities, so we should aim to 
analyse such structures in a space where they live naturally. In this paper, we develop and 
illustrate such an approach, based around a geometric representation for variance matrices 
and exploiting second-order exchangeability specifications for them. 
2 Current approaches to covariance estimation



Until recently, most authors have followed a Wishart conjugate prior approach (see, for
example, Chen (1979) or Haff (1950)). This approach, whilst tractable, places severe restrictions
on the form of the prior distribution. More recently, a different approach has been proposed
by Leonard and Hsu (1992), who learn about the log of the covariance matrix using data.
This solves the positivity problems associated with covariance revision, but makes prior belief
specification more difficult.

Brown, Le, and Zidek (1994) make further progress: working within a distributional
Bayesian paradigm, they develop a reasonably flexible prior over the elements of a covari- 
ance structure, and offer interpretations for the parameters that one is required to specify. 
However, this work is still restricted to multivariate Normal likelihoods, and there is a weak 
restriction on the form of the mean structure for the data. 



3 Bayes linear methods 

The Bayes linear approach to subjective statistical inference makes expectation (rather than
probability) primitive. An overview of the methodology is given in Farrow and Goldstein
(1993). In particular, as we are not forced to specify full prior measures over all variables
of interest, we may exploit second-order exchangeability to allow us to construct statistical
models directly from small numbers of belief specifications over observables. Foundational
issues raised by Bayes linear analysis of exchangeable specifications are discussed in Goldstein
(1994). We now show that these methods offer a simple and tractable approach to covariance
estimation, linking sample covariance matrices with their "population" counterparts, in a
natural geometric setting.



4 Exchangeable representations for covariances 

Let $X_1, X_2, \ldots$ be an infinite, second-order exchangeable sequence of random vectors, each of
length $r$, namely a sequence for which $X_k = (X_{1k}, \ldots, X_{rk})^T$, $\mu_i = E(X_{ik})$ and
$c_{ij} = \mathrm{Cov}(X_{ik}, X_{jk})$ do not depend on $k$, and $c'_{ij} = \mathrm{Cov}(X_{ik}, X_{jl})$, $k \neq l$, does not
depend on $k, l$.

From this specification, we may use the second-order exchangeability representation theorem
(Goldstein 1986) to decompose $X_{ik}$ as

$$X_{ik} = M_i + R_{ik} \qquad (1)$$

where $E(R_{ik}) = \mathrm{Cov}(M_j, R_{ik}) = \mathrm{Cov}(R_{ik}, R_{jl}) = 0$, $\forall i, j, k \neq l$, and the vectors
$R_k = (R_{1k}, \ldots, R_{rk})^T$ form a second-order exchangeable sequence. Here, $M_i$ may be thought of as
representing underlying population behaviour, and $R_{ik}$ as representing individual variation.
Consider the sequence of $r(r+1)/2$-dimensional vectors

$$Y_k = (R_{1k}R_{1k}, \ldots, R_{1k}R_{rk}, R_{2k}R_{2k}, \ldots, R_{2k}R_{rk}, \ldots, R_{rk}R_{rk})^T \qquad (2)$$

representing the quadratic products of the residuals. Suppose that we assume that the $Y_k$
are second-order exchangeable, and that we express the additional specifications
$v_{ijpq} = \mathrm{Cov}(R_{ik}R_{jk}, R_{pk}R_{qk})$ and $v'_{ijpq} = \mathrm{Cov}(R_{ik}R_{jk}, R_{pl}R_{ql})$, $k \neq l$. Then we may
similarly decompose the elements of $Y_k$ as

$$R_{ik}R_{jk} = V_{ij} + U_{ijk} \qquad (3)$$

with properties as for representation (1). In particular, $\mathrm{Cov}(U_{ijk}, U_{pqk}) = u_{ijpq}$ is not
dependent on $k$. Here, $V_{ij}$ represents underlying covariance behaviour, and $U_{ijk}$ represents
individual variation within the quadratic products of residuals.
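As a small illustration of representations (1) and (2), the following sketch builds the vector of quadratic products for a few simulated individuals. The Gaussian draws, the seed and the function name `quadratic_products` are assumptions made purely for illustration; the representation itself requires no distributional form.

```python
import random

def quadratic_products(residual):
    """Return the products R_i * R_j for i <= j, ordered row by row as in (2)."""
    r = len(residual)
    return [residual[i] * residual[j] for i in range(r) for j in range(i, r)]

random.seed(0)
r = 3
# Hypothetical "population" component M and residuals R_k (one row per individual k).
M = [random.gauss(0.0, 1.0) for _ in range(r)]
R = [[random.gauss(0.0, 1.0) for _ in range(r)] for _ in range(5)]

# Representation (1): each observation is population behaviour plus individual variation.
X = [[M[i] + R_k[i] for i in range(r)] for R_k in R]

# One vector Y_k of quadratic products per individual, each of length r(r+1)/2.
Y = [quadratic_products(R_k) for R_k in R]
```

Each `Y[k]` has $r(r+1)/2 = 6$ entries when $r = 3$, one for each distinct element of a symmetric $3 \times 3$ matrix.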

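If we observe a sample $X_1, \ldots, X_n$ of size $n$, then sample covariances take the form

$$S_{ij} = \frac{1}{n-1} \sum_{w=1}^{n} (X_{iw} - \bar{X}_i)(X_{jw} - \bar{X}_j) \qquad (4)$$

$$\phantom{S_{ij}} = \frac{1}{n-1} \sum_{w=1}^{n} (R_{iw} - \bar{R}_i)(R_{jw} - \bar{R}_j) \qquad (5)$$

where $\bar{X}_i$ and $\bar{R}_i$ denote the corresponding sample means. Beliefs over the sample
covariances $S_{ij}$ are, by (5), uniquely determined by representation (3), and can be written

$$S_{ij} = V_{ij} + T_{ij} \qquad (6)$$

where $V_{ij}$ is as in (3), $E(T_{ij}) = 0$ and $\mathrm{Cov}(V_{ij}, T_{pq}) = 0$. The covariance structure over
the $T_{ij}$ is given by

$$\mathrm{Cov}(T_{ij}, T_{pq}) = \frac{1}{n}\,\mathrm{Cov}(U_{ijk}, U_{pqk}). \qquad (7)$$

Observing sample covariances from a sample of size $n$ reduces uncertainty for $V_{ij}$, the
underlying covariance values, but is uninformative for the $U_{ijk}$ for $k > n$.

Let $S$ be the matrix whose $(i, j)$th element is $S_{ij}$, and define $V$ and $T$ similarly. We then
have

$$S = V + T \qquad (8)$$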
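The step from (4) to (5) holds because the common component $M_i$ cancels when each observation is centred on its sample mean. The sketch below checks this numerically. It assumes the usual divisor $n - 1$, and the simulated values and the function name `sample_covariance` are illustrative only.

```python
import random

def sample_covariance(Z):
    """Sample covariance matrix of the rows of Z, using the divisor n - 1."""
    n, r = len(Z), len(Z[0])
    means = [sum(row[i] for row in Z) / n for i in range(r)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in Z) / (n - 1)
             for j in range(r)]
            for i in range(r)]

random.seed(1)
n, r = 34, 3
M = [random.gauss(50.0, 10.0) for _ in range(r)]                     # hypothetical M_i
R = [[random.gauss(0.0, 5.0) for _ in range(r)] for _ in range(n)]   # hypothetical R_iw
X = [[M[i] + R_w[i] for i in range(r)] for R_w in R]                 # representation (1)

S_from_X = sample_covariance(X)   # equation (4)
S_from_R = sample_covariance(R)   # equation (5)

# The two matrices agree up to floating-point rounding.
largest_gap = max(abs(S_from_X[i][j] - S_from_R[i][j])
                  for i in range(r) for j in range(r))
```

Although the residuals are never observed directly, this is why beliefs about the observable $S_{ij}$ can be derived entirely from beliefs about the quadratic products of residuals.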



5 Geometric representation for random matrices 

We now develop the representation which will allow us to treat a covariance matrix as a
single object. Let $B = [B_1, B_2, \ldots]$ be a collection of random $r \times r$ real symmetric matrices,
representing unknown matrices of interest to us. These might, for example, represent
population covariance matrices. Let $D = [D_1, D_2, \ldots]$ be another such collection, representing
observable matrices (such as sample covariance matrices). Finally, let $C = [C_1, C_2, \ldots]$ be a
basis for the space of constant $r \times r$ real symmetric matrices. We now form a vector space

$$L = \mathrm{span}\{B \cup C \cup D\} \qquad (9)$$

of all linear combinations of the elements of these collections, and define the inner-product
(over equivalence classes) on $L$ as

$$\langle P, Q \rangle = E(\mathrm{Tr}(PQ)) \qquad \forall P, Q \in L \qquad (10)$$
which induces the metric

$$d(P, Q)^2 = E(\|P - Q\|_F^2) \qquad \forall P, Q \in L, \qquad (11)$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. The squared Frobenius norm is the sum
of the squares of the elements or, equivalently for a symmetric matrix, the sum of the squares
of the eigenvalues. Where necessary, we form the completion of the space. The (complete)
inner-product space is denoted by $M$. Analogously with the revision of belief over scalar
quantities (Goldstein 1981), we learn about the elements of the collection $B$ by orthogonal
projection into subspaces of $M$ spanned by elements of the collection $C \cup D$, in order to
obtain the corresponding adjusted expectations, namely the linear combinations of sample
covariance matrices which give our adjusted beliefs.
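The inner-product (10) and the metric (11) can be approximated by averaging over simulated draws of the random matrices. The sketch below does this for symmetric $2 \times 2$ matrices and checks, for one draw, that the trace of the squared difference, the sum of squared elements and the sum of squared eigenvalues coincide. The Gaussian draws and all function names are assumptions for illustration; in a Bayes linear analysis these expectations are specified directly rather than simulated.

```python
import math
import random

def trace_of_product(P, Q):
    """Tr(PQ) for two square matrices stored as nested lists."""
    r = len(P)
    return sum(P[i][j] * Q[j][i] for i in range(r) for j in range(r))

def inner_product(P_draws, Q_draws):
    """Monte Carlo estimate of E(Tr(PQ)) from paired draws, as in (10)."""
    return sum(trace_of_product(P, Q) for P, Q in zip(P_draws, Q_draws)) / len(P_draws)

def random_symmetric_2x2():
    a, b, d = (random.gauss(0.0, 1.0) for _ in range(3))
    return [[a, b], [b, d]]

def eigenvalues_2x2(P):
    """Closed-form eigenvalues of a symmetric 2 x 2 matrix."""
    mid = (P[0][0] + P[1][1]) / 2
    half_gap = math.hypot((P[0][0] - P[1][1]) / 2, P[0][1])
    return mid - half_gap, mid + half_gap

random.seed(2)
P_draws = [random_symmetric_2x2() for _ in range(1000)]
Q_draws = [random_symmetric_2x2() for _ in range(1000)]
diffs = [[[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]
         for P, Q in zip(P_draws, Q_draws)]

squared_distance = inner_product(diffs, diffs)   # estimate of d(P, Q)^2 in (11)

# For a single symmetric matrix, three routes to the squared Frobenius norm.
D0 = diffs[0]
sum_sq_elements = sum(x * x for row in D0 for x in row)
sum_sq_eigenvalues = sum(lam * lam for lam in eigenvalues_2x2(D0))
trace_of_square = trace_of_product(D0, D0)
```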

If all matrices of interest contain only one non-zero component (all in the same position),
the inner product becomes $\langle P, Q \rangle = E(P_{ij}Q_{ij})$, inducing the distance
$d(P, Q)^2 = E((P_{ij} - Q_{ij})^2)$, as for the usual Bayes linear theory for scalar quantities. The matrix
structure is a generalisation of the scalar Bayes linear structure, and scalar Bayes linear
adjustments can be recovered by decomposing all variance structures to the one component level.

The matrices we are considering do not have to be finite dimensional. All of the theory 
remains valid if we think in terms of representations of random linear self-adjoint operators 
on a (possibly infinite-dimensional) vector space. 



6 Decomposing the variance structure 



As a simple example, B might consist only of the "population" covariance matrix, V, for a
particular problem, and D might be the corresponding sample covariance matrix, S, based 
on n observations. In this case, our adjusted expectation for the "population" matrix would 
be a weighted linear combination of the prior and sample covariance matrices. However, 
by breaking down the sample covariance matrix into its component sub-matrices, we may 
resolve a greater proportion of our uncertainty about the "population" covariance matrix. 

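For simplicity, consider the case where we wish to learn about the covariance structure
induced by representation (3) for 2-dimensional vectors. The covariance matrices will be
$2 \times 2$. Consider the sample covariance matrix

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{12} & S_{22} \end{pmatrix} \qquad (12)$$

and the corresponding "population" covariance matrix

$$V = \begin{pmatrix} V_{11} & V_{12} \\ V_{12} & V_{22} \end{pmatrix}. \qquad (13)$$

In the notation of the previous section, we could restrict ourselves to

$$B = [V], \quad D_S = [S], \quad C = \left[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right] \qquad (14)$$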
where all constant symmetric $2 \times 2$ matrices can be constructed as linear combinations of the
elements of $C$. Using these collections, our adjusted expectation for $V$ given $D_S$ would take
the form

$$E_{D_S}(V) = (1 - a)E(V) + aS \qquad (15)$$

where $a$ is the coefficient of the orthogonal projection determined by the inner-product (10).
Explicitly:

$$a = \frac{\langle V - E(V), V - E(V) \rangle}{\langle S - E(S), S - E(S) \rangle} \qquad (16)$$

$$\phantom{a} = \frac{\sum_{i=1}^{2}\sum_{j=1}^{2} n\,\mathrm{Var}(V_{ij})}{\sum_{i=1}^{2}\sum_{j=1}^{2} \{ n\,\mathrm{Var}(V_{ij}) + \mathrm{Var}(U_{ijk}) \}} \qquad (17)$$
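Equations (15) and (17) translate directly into a short calculation. The sketch below is illustrative only: the variance specifications, the prior mean and the sample matrix are made-up numbers, and the function names are hypothetical.

```python
def adjustment_coefficient(var_V, var_U, n):
    """Equation (17): var_V[i][j] = Var(V_ij), var_U[i][j] = Var(U_ijk), n = sample size."""
    total_V = sum(sum(row) for row in var_V)
    total_U = sum(sum(row) for row in var_U)
    return n * total_V / (n * total_V + total_U)

def adjust_by_sample_matrix(prior_mean, S, a):
    """Equation (15): weighted combination of the prior mean E(V) and the sample matrix S."""
    r = len(S)
    return [[(1 - a) * prior_mean[i][j] + a * S[i][j] for j in range(r)]
            for i in range(r)]

# Hypothetical second-order specifications for a 2 x 2 problem.
var_V = [[1.0, 0.5], [0.5, 2.0]]     # uncertainty about the underlying covariances
var_U = [[8.0, 5.0], [5.0, 16.0]]    # individual variation in the quadratic products
prior_mean = [[4.0, 1.0], [1.0, 9.0]]
S = [[6.0, 2.0], [2.0, 7.0]]

a = adjustment_coefficient(var_V, var_U, n=10)
adjusted = adjust_by_sample_matrix(prior_mean, S, a)
```

As $n$ grows the coefficient tends to 1, so the adjusted expectation moves from the prior mean towards the sample covariance matrix.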



However, to improve the precision of our estimates, our projection space could be enlarged
by constructing

$$D_I = \left[ \begin{pmatrix} S_{11} & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & S_{12} \\ S_{12} & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & S_{22} \end{pmatrix} \right] \qquad (18)$$

We call such a space the individual variance collection. This allows different sample
covariances to have different weights if, for example, we have higher prior uncertainty about some
of the variances. Indeed, we may take this a stage further, and construct

$$D_C = [\, S_{ij} C_k : 1 \le i \le j \le 2,\ k = 1, 2, 3 \,] \qquad (19)$$

where $C_1, C_2, C_3$ are the elements of the basis $C$. We call this last collection the complete
variance collection. This not only allows the different covariances to have different weights,
but also allows relationships between covariances to have an effect on the adjustment. If we
project $V$ into $D_C$, then our adjusted expectation for $V$ will correspond precisely with the
adjustment which would have been obtained using Bayes linear estimation on the quadratic
products of the residuals in the scalar space.

We can break down the population matrix in the same way if necessary. In particular,
we let

$$V_I = \left[ \begin{pmatrix} V_{11} & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & V_{12} \\ V_{12} & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & V_{22} \end{pmatrix} \right] \qquad (20)$$

As we enlarge the projection space, we resolve more of our uncertainty about the variance
structures, at the expense of doing more work. Generally we should project into as rich a
space as is practicable, but for large variance matrices, the difference both in computational
effort and in effort required for prior specification, between adjusting by $D_S$, $D_I$ and $D_C$ is
substantial, so that we must make a subjective assessment of the relative benefits of each
adjustment.
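To make the sizes of these projection spaces concrete, the sketch below builds an individual and a complete variance collection from a symmetric matrix. It rests on two assumptions: that the basis of constant symmetric matrices is the one with a single free component per element, and that the complete collection consists of every sample covariance multiplied by every basis matrix. Both readings agree with the counts of 6 and 36 objects quoted for the $3 \times 3$ example, but the function names and the construction are illustrative rather than the authors' own code.

```python
def symmetric_basis(r):
    """One possible basis of constant symmetric r x r matrices (an assumed choice of C)."""
    basis = []
    for i in range(r):
        for j in range(i, r):
            C = [[0.0] * r for _ in range(r)]
            C[i][j] = C[j][i] = 1.0
            basis.append(C)
    return basis

def scale(c, C):
    """Multiply every element of the matrix C by the scalar c."""
    return [[c * x for x in row] for row in C]

def individual_collection(S):
    """Each distinct element S_ij placed in its own position(s), zeros elsewhere."""
    r = len(S)
    index_pairs = [(i, j) for i in range(r) for j in range(i, r)]
    return [scale(S[i][j], C) for (i, j), C in zip(index_pairs, symmetric_basis(r))]

def complete_collection(S):
    """Every distinct element S_ij multiplied by every basis matrix."""
    r = len(S)
    basis = symmetric_basis(r)
    return [scale(S[i][j], C) for i in range(r) for j in range(i, r) for C in basis]

# Illustrative 3 x 3 symmetric matrix.
S = [[8.28, 20.15, 24.75],
     [20.15, 178.30, 160.74],
     [24.75, 160.74, 258.26]]
D_I = individual_collection(S)   # 6 objects when r = 3
D_C = complete_collection(S)     # 36 objects when r = 3
```

The growth from $r(r+1)/2$ to $(r(r+1)/2)^2$ objects is the source of the extra specification and computational effort discussed above.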
7 Example



7.1 Examination performance 

We are currently investigating the examination performance of first year mathematics
undergraduate students at Durham University. We are particularly interested in those students
who have only one A Level in mathematics, and so we restrict attention to these in our 
account. For illustrative purposes, we focus on a few key variables, namely a summary of A 
Level performance (A), performance in the Christmas exams (C), and the end of year exam 
average (E). 

For the exchangeable decomposition of (say) $A_k$, we will write

$$A_k = M_A + R_{Ak} \qquad (21)$$

and for the exchangeable decomposition of (say) $R_{Ak}R_{Ck}$, we write

$$R_{Ak}R_{Ck} = V_{AC} + U_{ACk} \qquad (22)$$

so that, for example, $V_{AC}$ represents the underlying covariance between the A and C variables,
and $U_{ACk}$ represents the residual for the $k$th observation. We construct the "population"
and sample covariance matrices:

$$V = \begin{pmatrix} V_{AA} & V_{AE} & V_{AC} \\ V_{AE} & V_{EE} & V_{EC} \\ V_{AC} & V_{EC} & V_{CC} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{AA} & S_{AE} & S_{AC} \\ S_{AE} & S_{EE} & S_{EC} \\ S_{AC} & S_{EC} & S_{CC} \end{pmatrix} \qquad (23)$$

A conditional linear independence graph (Goldstein 1990) was formed to represent beliefs
about the relationships between the quadratic products of the residuals (Figure 1). The
common variance node reflects beliefs about the positive correlation between variances.
Covariances are influenced by the corresponding variances. This graph was used to help structure
the belief specification over the mean components of the variance structure.

Specifications are also required over the residual components of the variance structure.
These specifications are more difficult to make, since we are not used to thinking about such
quantities. In this example, for simplicity, our belief specifications over the residual structure
were chosen to be consistent with those imposed under a multivariate normal specification
corresponding to our prior specifications over the elements $R_{ik}$. Having made specifications
over the quadratic products of residuals, beliefs over all relevant covariance matrices are now
determined.

From the sample covariance matrix, $D_S = [S]$, we construct the individual variance
collection, $D_I$ (6 objects), and the complete variance collection, $D_C$ (36 objects), as well as the
individual collection for the mean structure, $V_I$ (6 objects). We form the random matrix
space, $M$, over all these objects, and investigate adjustments in this space.
Figure 1: A conditional linear independence graph for the mean components of the quadratic
products of the residuals 



7.2 Quantitative analysis 

The prior covariance matrix was specified directly as follows:

$$E(V) = \begin{pmatrix} & 11.14 & 15.75 \\ 11.14 & 56.26 & 53.04 \\ 15.75 & 53.04 & 100.00 \end{pmatrix} \qquad (24)$$

The sample covariance matrix (34 cases) is:

$$S = \begin{pmatrix} 8.28 & 20.15 & 24.75 \\ 20.15 & 178.30 & 160.74 \\ 24.75 & 160.74 & 258.26 \end{pmatrix} \qquad (25)$$

The adjusted matrices were formed as the appropriate linear combinations of the observables,
as described in section 5, and derived explicitly for the simplest case in section 6.

$$E_{D_S}(V) = \begin{pmatrix} 8.08 & 14.08 & 18.69 \\ 14.08 & 96.08 & 88.18 \\ 18.69 & 88.18 & 151.65 \end{pmatrix} \qquad (26)$$

$$E_{D_I}(V) = \begin{pmatrix} 8.04 & 15.96 & 17.72 \\ 15.96 & 98.90 & 78.63 \\ 17.72 & 78.63 & 159.21 \end{pmatrix} \qquad (27)$$

$$E_{D_C}(V) = \begin{pmatrix} 8.30 & 15.43 & 20.06 \\ 15.43 & 92.04 & 80.66 \\ 20.06 & 80.66 & 156.79 \end{pmatrix} \qquad (28)$$

These adjusted matrices may be used as a basis for assessing our posterior beliefs about the
matrix object (Goldstein 1994).
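The relationship between (24), (25) and (26) can be checked against the form (15), in which the adjusted matrix is a single weighted combination of the prior and sample matrices. The sketch below recovers the weight by least squares from the reported figures. The value of the weight is not stated in the text, so the number obtained here is an inference from rounded entries, and the (1,1) entry of the prior matrix, which is missing from the text as given, is simply left out.

```python
# Entries as reported in (24), (25) and (26); None marks the missing (1,1) prior entry.
prior = [[None, 11.14, 15.75],
         [11.14, 56.26, 53.04],
         [15.75, 53.04, 100.00]]
S = [[8.28, 20.15, 24.75],
     [20.15, 178.30, 160.74],
     [24.75, 160.74, 258.26]]
adjusted = [[8.08, 14.08, 18.69],
            [14.08, 96.08, 88.18],
            [18.69, 88.18, 151.65]]

known = [(i, j) for i in range(3) for j in range(3) if prior[i][j] is not None]

# Least-squares fit of a in: adjusted - prior = a * (S - prior), over the known entries.
numerator = sum((adjusted[i][j] - prior[i][j]) * (S[i][j] - prior[i][j]) for i, j in known)
denominator = sum((S[i][j] - prior[i][j]) ** 2 for i, j in known)
a_hat = numerator / denominator

# How closely does (1 - a) E(V) + a S reproduce the reported adjusted matrix?
largest_gap = max(abs((1 - a_hat) * prior[i][j] + a_hat * S[i][j] - adjusted[i][j])
                  for i, j in known)
```

A single weight of roughly one third reproduces every known entry of (26) to within rounding, which is what adjustment by the sample covariance matrix alone implies.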



Note that the last matrix (28) represents the adjusted matrix which would have been
obtained using a standard Bayes linear analysis on the quadratic products of the residuals.
In this particular example, all adjusted matrices are positive definite. In general, we view
negative eigenvalues in the revised structure as providing diagnostic warnings of possible
conflicts between prior beliefs and the data.
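Positive definiteness of the three adjusted matrices can be confirmed without computing eigenvalues. The sketch below uses Sylvester's criterion instead: a symmetric matrix is positive definite exactly when all of its leading principal minors are positive. This is a substitute for the eigenvalue check mentioned in the text, chosen because it needs only the standard library, and the function names are hypothetical.

```python
def leading_principal_minors(A):
    """Leading principal minors of a 3 x 3 matrix stored as nested lists."""
    m1 = A[0][0]
    m2 = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    m3 = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return m1, m2, m3

def is_positive_definite(A):
    """Sylvester's criterion for a symmetric 3 x 3 matrix."""
    return all(minor > 0 for minor in leading_principal_minors(A))

# Adjusted matrices as reported in (26), (27) and (28).
adjusted = {
    "D_S": [[8.08, 14.08, 18.69], [14.08, 96.08, 88.18], [18.69, 88.18, 151.65]],
    "D_I": [[8.04, 15.96, 17.72], [15.96, 98.90, 78.63], [17.72, 78.63, 159.21]],
    "D_C": [[8.30, 15.43, 20.06], [15.43, 92.04, 80.66], [20.06, 80.66, 156.79]],
}
checks = {name: is_positive_definite(matrix) for name, matrix in adjusted.items()}
```

A `False` entry in `checks` would correspond to the diagnostic warning described above, namely a revised structure with a non-positive eigenvalue.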

We would like to be able to compare the estimates of $V$: $E_{D_C}(V)$, $E_{D_S}(V)$, and $E_{D_I}(V)$.
Thus, we use the standard interpretive and diagnostic features of the Bayes linear
methodology to assess the model and understand the adjustments taking place.



7.3 Bayes linear influence diagram 

Figure 2 shows a Bayes linear influence diagram representing the adjustments and
corresponding diagnostic information for the random matrices. Such diagrams are described in
detail in Goldstein, Farrow, and Spiropoulos (1993) for random quantities, with a similar
interpretation for random matrices, where conditional linear independence is determined
instead by the inner-product (10), so that conditional linear independence becomes

$$B \perp\!\!\!\perp C \,/\, D \iff E[\mathrm{Tr}\{(B - E_D(B))(C - E_D(C))\}] = 0 \qquad (29)$$

as described in Goldstein (1990).

The outer shadings of the $V$ node represent proportions of uncertainty about $V$ resolved
by projection into the various spaces. Shadings start at 3 o'clock, and progress in an
anti-clockwise fashion. The full circle represents the total uncertainty about the value of the
covariance matrix. The first outer portion shaded represents the proportion of our
uncertainty resolved by the sample covariance matrix alone ($D_S$). By comparing this with the
first shaded portion for the $V_I$ node, we see that we have learned considerably more about
the matrix object than we have about the 6-dimensional space over the individual variance
collection.

The next shading gives the additional information gained by using the individual
collection as the projection space. We see that this tells us a great deal more about the elements
of the $V_I$ collection, but little about the matrix object as a whole. The other shading shows
the additional uncertainty resolved due to including the complete variance collection in our
projection space. We see that there is information to be gained by enriching our projection
space, but we must balance information gained with extra effort involved. Whether or not we
choose to include the complete variance collection will depend upon the size of the problem
under consideration, and upon how much the answer really matters.

Shadings in the centres of the nodes are diagnostics based on the size and bearing of the
adjustments, as described in Goldstein (1988). We generalise the bearing to the space of
random matrices as follows: for any given constant matrix, $G$, and projection space $D$, the
bearing is defined to be the unique random matrix, $B$, with the property

$$\langle A - E(A), B \rangle = \langle E_d(A), G \rangle - \langle E(A), G \rangle, \qquad \forall A \in M \qquad (30)$$

where $E_d(A)$ represents the realisation of $E_D(A)$ after observing $D = d$. Different choices of
the constant matrix, $G$, give information about different projections of the adjusted
expectations. The choice of $G$ which causes diagnostics to match up exactly with those for scalar
Bayes linear adjustment in the case where we are dealing exclusively with one-component
matrices, is the choice given by the constant matrix whose elements are all 1. At the centre
of the $V$ node, dark (light) shadings represent changes in expectation larger (smaller) than
we expected a priori. We can see that adjusting by the sample covariance matrix, $D_S$,
caused a much larger change in expectation than we expected a priori. This is evidence that
we were too confident about our ability to predict the true value of the covariance matrix,
and suggests that we should re-examine the prior specification. We also notice that adding
the complete variance collection, $D_C$, to the adjustment had the potential to change our
expectation considerably, but in fact, hardly changed it at all. This is perhaps evidence that
we overestimated the importance of the covariance terms.

Figure 2: Diagnostic influence diagram summarising changes in expectation of the matrix
objects



8 Summary 

Analysing matrices in a space where they live naturally not only has great aesthetic appeal, 
but is very powerful and illuminating in practice. Working in this space simplifies the 
handling of large matrices, by reducing the number of quantities involved and summarising 
effects over the whole covariance structure. For the same reasons, diagnostic information 
about adjusted beliefs is easier to interpret. We may decompose structures as much or as 
little as we wish. 

This approach allows us to learn about collections of covariance structures, and examine 
their relationships. It generalises the "element by element" approach to revision, which can 
be viewed as taking place in a subspace of the larger space. Exchangeability representations 
lie at the heart of the methodology: all of our specifications are over observables, or quantities 
constructed from observables, rather than artificial model parameters, and we make no 
distributional assumptions for the data or the prior. 